Exercise

The Atlas of Living Australia data for the area around Monash’s Clayton campus, Melbourne, has been downloaded for you.

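library(tidyverse)   # read_csv(), dplyr verbs and the %>% pipe
library(lubridate)   # year(), used below when subsetting by date
monash <- read_csv("data/monash_species.csv")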
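  1. (1pt) What were the earliest and latest dates on which wildlife was sighted in the data provided? The earliest sighting is on 1898-07-14 and the latest on 2018-08-30 (5 records have no parsed date).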
 Event Date - parsed 
 Min.   :1898-07-14  
 1st Qu.:2014-08-07  
 Median :2016-04-27  
 Mean   :2014-12-17  
 3rd Qu.:2018-04-07  
 Max.   :2018-08-30  
 NA's   :5           
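The same two dates can be computed directly. A minimal base-R sketch, assuming the parsed dates are held in a `Date` vector; the toy values below are hypothetical stand-ins for the `Event Date - parsed` column, not the real data:

```r
# Hypothetical stand-in for monash$`Event Date - parsed`
event_date <- as.Date(c("2014-08-07", "1898-07-14", NA, "2018-08-30", "2016-04-27"))

# range() returns c(earliest, latest); na.rm = TRUE skips undated records
date_range <- range(event_date, na.rm = TRUE)

# Number of records without a parsed date (the NA's row of summary())
n_undated <- sum(is.na(event_date))

print(date_range)
print(n_undated)
```

Without `na.rm = TRUE`, a single missing date makes both ends of the range `NA`.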
  1. (1pt) How many different species have been sighted? 316
# A tibble: 316 x 2
   `Scientific Name`                             n
   <chr>                                     <int>
 1 Cracticus tibicen                           861
 2 Manorina (Myzantha) melanocephala           704
 3 Trichoglossus haematodus                    630
 4 Glossopsitta concinna                       561
 5 Grallina cyanoleuca                         533
 6 Corvus mellori                              518
 7 Anthochaera carunculata                     500
 8 Gallinula (Gallinula) tenebrosa tenebrosa   361
 9 Hirundo neoxena                             332
10 Anas (Anas) superciliosa                    330
# ... with 306 more rows
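The species count is the number of distinct scientific names. The base-R sketch below, on hypothetical toy names, mirrors `count(`Scientific Name`, sort = TRUE)`. Note that `dplyr::count()` keeps a missing name as its own `NA` group, whereas `table()` drops missing values by default, so the two totals can differ by one.

```r
# Hypothetical stand-in for monash$`Scientific Name`
sci_name <- c("Cracticus tibicen", "Corvus mellori", "Cracticus tibicen",
              "Hirundo neoxena", "Cracticus tibicen", "Corvus mellori", NA)

# Number of distinct species, ignoring records with no scientific name
n_species <- length(unique(na.omit(sci_name)))

# Sightings per species, most common first
species_counts <- sort(table(sci_name), decreasing = TRUE)

print(n_species)
print(species_counts)
```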
  1. (1pt) What species is the most commonly sighted around Monash?

Cracticus tibicen, the Australian magpie, with 861 sightings.

  1. It’s a bit surprising to see sightings dated in the 1800s. Let’s look at these in more detail.

    1. (1pt) Count the number of sightings by day, and make a plot of count by date. What year do you think Monash University was established (based on this data)? Why? Check your guess using Google. Around 1960, because that is when sightings start to be recorded regularly. (Monash University was established in 1958 and admitted its first students in 1961.)
    2. (1pt) Subset the measurements recorded prior to 1950. Who was the collector? The collector is recorded as “RAOU VIC historical observer”.
    3. (1pt) Of these historically observed species, are any not seen around campus any more? Yes: 10 species, including the budgerigar, are no longer commonly seen (one has no vernacular name and is shown as <NA>).

# A tibble: 10 x 2
   `Vernacular name`            n
   <chr>                    <int>
 1 Australasian Bittern         1
 2 Australian Pipit             1
 3 Budgerigar                   1
 4 Fork-tailed swift            1
 5 Regent Honeyeater            1
 6 Shining Bronze-cuckoo        4
 7 White-browed Woodswallow     1
 8 White-winged Triller         1
 9 Yellow-tail                  1
10 <NA>                         1
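Both the daily counts and the "no longer seen" check come down to simple grouping. A minimal base-R sketch on a hypothetical toy table with `date` and `vernacular` columns (the real data uses `Event Date - parsed` and `Vernacular name`) that counts sightings per day and then lists species recorded before 1950 but never after it:

```r
# Hypothetical toy sightings, not the ALA data
sightings <- data.frame(
  date = as.Date(c("1898-07-14", "1898-07-14", "1936-10-01", "1965-03-02",
                   "2016-04-27", "2016-04-27", "2018-08-30")),
  vernacular = c("Budgerigar", "Australian Magpie", "Regent Honeyeater",
                 "Australian Magpie", "Australian Magpie", "Rainbow Lorikeet",
                 "Rainbow Lorikeet"),
  stringsAsFactors = FALSE
)

# Sightings per day: table() counts each distinct date (in chronological order)
day_counts <- table(sightings$date)
by_day <- data.frame(date = as.Date(names(day_counts)),
                     n = as.integer(day_counts))

# Species seen before 1950 versus from 1950 onwards
cutoff <- as.Date("1950-01-01")
before <- unique(sightings$vernacular[sightings$date < cutoff])
after  <- unique(sightings$vernacular[sightings$date >= cutoff])

# Historical species with no sightings at all since the cutoff
not_seen_since <- setdiff(before, after)

print(by_day)
print(not_seen_since)
```

`plot(n ~ date, data = by_day, type = "h")` then draws the count-by-date plot. This sketch applies the stricter test of no sightings at all since 1950; the answer above speaks of species that are no longer *commonly* seen, which would need a small count threshold instead of zero.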
  1. (1pt) We are going to create a subset especially for you to analyse: a random sample of 4 of the species commonly seen in recent years. Using the code provided, do the following:

Subset the data to sightings from 1950 onwards. Count the number of sightings of each species. Randomly sample 4 of the species that have been sighted at least 100 times. List the scientific names of your four species.

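myspecies <- monash %>%
  # keep sightings from 1950 onwards (undated records are dropped)
  filter(year(`Event Date - parsed`) >= 1950) %>%
  # sightings per species, most frequent first
  count(`Scientific Name`, sort = TRUE) %>%
  # "at least 100 times" means n >= 100; n > 100 would drop species with exactly 100
  filter(n >= 100) %>%
  # draw 4 species at random (slice_sample() supersedes sample_n() in dplyr >= 1.0.0)
  slice_sample(n = 4)
mysample <- monash %>%
  filter(year(`Event Date - parsed`) >= 1950) %>%
  # keep only the sightings of the four sampled species
  filter(`Scientific Name` %in% myspecies$`Scientific Name`)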
# My four
> myspecies
# A tibble: 4 x 2
  `Scientific Name`                       n
  <chr>                               <int>
1 Anser                                 175
2 Cracticus torquatus                   307
3 Cracticus tibicen                     861
4 Streptopelia (Spilopelia) chinensis   302
  1. (2 pts) Make a map of campus, and plot the locations of species sightings, coloured and faceted by the different species. Write a few sentences describing the distribution of the species; use the vernacular name for any that have one.

  1. (2 pts) Aggregate the sightings for each species, by month. Make a plot of number of sightings by month. Write a sentence or two discussing the relative frequency of sightings by month of the year.

The magpie is sighted more often during its nesting season (August to November), while my other three species are seen fairly evenly throughout the year.
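Aggregating by month amounts to extracting the month from each date and cross-tabulating it against species. A base-R sketch on hypothetical toy data; with the real data, `lubridate::month()` would be the tidyverse equivalent of the `format()` call:

```r
# Hypothetical toy sightings for two species
sightings <- data.frame(
  species = c("Magpie", "Magpie", "Magpie", "Spotted dove", "Spotted dove"),
  date = as.Date(c("2017-08-14", "2017-09-02", "2018-09-20",
                   "2017-01-05", "2017-07-11")),
  stringsAsFactors = FALSE
)

# Month of year as an integer 1-12
sightings$month <- as.integer(format(sightings$date, "%m"))

# Species x month counts; the factor levels keep months with zero sightings
by_month <- table(sightings$species, factor(sightings$month, levels = 1:12))

# Share of each species' sightings falling in each month (rows sum to 1)
rel_month <- prop.table(by_month, margin = 1)

print(by_month)
```

Comparing rows of `rel_month` rather than raw counts stops a very common species such as the magpie from dominating the comparison of seasonal patterns.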

  1. (2 pts) Aggregate by hour of the day. Make a line plot of frequency of sighting by hour. What are the most common times of day to see these species?

Mostly during the day: sightings rise from around 9 to 10 am and taper off through the afternoon. This most likely reflects human rather than bird behaviour, since sightings are reported while people are on campus during the working day.
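The hourly aggregation follows the same pattern. A base-R sketch on hypothetical timestamps; the toy times use UTC so the result does not depend on the machine, but real sighting times should be interpreted in the local time zone (`Australia/Melbourne`), otherwise the hours will be shifted:

```r
# Hypothetical observation times
obs_time <- as.POSIXct(c("2017-08-14 09:30:00", "2017-08-15 10:05:00",
                         "2017-08-16 10:40:00", "2017-08-20 15:15:00",
                         "2017-08-21 06:50:00"), tz = "UTC")

# Hour of day, 0-23
hour <- as.integer(format(obs_time, "%H"))

# Counts for all 24 hours: bin 1 holds hour 0, bin 24 holds hour 23
by_hour <- tabulate(hour + 1L, nbins = 24L)
names(by_hour) <- 0:23

# Busiest hour (the earliest one, if several tie)
peak_hour <- as.integer(names(which.max(by_hour)))

print(peak_hour)
```

`plot(0:23, by_hour, type = "l")` then gives the line plot the question asks for.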

  1. (3 pts) Find the species description on Wikipedia. Read in the text description for each of your species using web scraping (example code is below). Conduct a text analysis to determine which words best distinguish the four species.

We can learn a lot about the species from the distinguishing keywords. For example, “Anser” are geese, and are likely an introduced species because “America” appears among their keywords. Similarly, the spotted dove appears to be an introduced species because India and Hawaii appear. Magpies can be dangerous (as we know), because the words “attack” and “aggressive” stand out. And the butcherbird is known for its “singing”.
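One common way to find the words that distinguish documents is tf-idf: a word scores highly for a species when it is frequent in that species' description but rare in the others. The exercise does not say which method is expected. The sketch below computes tf-idf in base R on short hypothetical sentences standing in for the scraped Wikipedia text; the `tidytext` package's `bind_tf_idf()` does the same job on a tidy word-count table.

```r
# Hypothetical stand-ins for the scraped descriptions, one per species
docs <- c(
  magpie      = "the magpie is aggressive and may attack people in nesting season",
  dove        = "the spotted dove was introduced from india and is common in hawaii",
  butcherbird = "the butcherbird is known for singing and its song is a loud song",
  anser       = "anser is a genus of geese found in america europe and asia"
)

# Lower-case, split on anything that is not a letter, drop empty strings
tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(w) w[nzchar(w)])
names(tokens) <- names(docs)
vocab <- sort(unique(unlist(tokens)))

# Term counts: one row per word, one column per species
counts <- vapply(tokens,
                 function(w) as.integer(table(factor(w, levels = vocab))),
                 integer(length(vocab)))
rownames(counts) <- vocab

# tf = share of a species' words; idf = log(#species / #species using the word)
tf <- sweep(counts, 2, colSums(counts), "/")
idf <- log(ncol(counts) / rowSums(counts > 0))
tfidf <- tf * idf  # idf has one entry per row, so it recycles down each column

# Three highest-scoring words per species (ties broken arbitrarily)
top_words <- apply(tfidf, 2, function(s) names(sort(s, decreasing = TRUE))[1:3])

print(top_words)
```

Words that occur in every description, such as "and" and "is", get an idf of zero, which is why tf-idf filters out common filler words without needing a stop-word list.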

  1. (2 pts) Now expand your subset again, to include the 25 most common species.
    1. Compute the frequency of sighting by hour of the day.
    2. Standardize the hourly counts for each species by dividing by that species' maximum hourly count. (This puts each species' counts in the range 0 to 1, relative to its busiest hour; dividing by the species' total instead would give the proportion of its sightings occurring in each hour.)
    3. Spread the data to have hour in the columns, species in the rows, and the standardized values in the cells.
    4. Compute the Euclidean distance between species, that is, the distance between their standardized hourly profiles.
    5. Convert the distances to a binary matrix (for example, 1 if two species are closer than a chosen cutoff and 0 otherwise), and use this to produce a network map of the species. This shows which species are more commonly seen at similar times of the day.
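Steps 2 to 5 can be sketched in base R on a hypothetical long table of hourly counts (four made-up species and six hours, rather than 25 species and 24 hours). The 0.6 distance cutoff used to build the binary matrix is an arbitrary assumption; in practice it is a tuning choice that controls how dense the network is.

```r
# Hypothetical hourly sighting counts in long form (species, hour, n)
long <- data.frame(
  species = rep(c("magpie", "wattlebird", "duck", "lorikeet"), each = 6),
  hour    = rep(6:11, times = 4),
  n       = c(2, 8, 10, 6, 3, 1,    # magpie
              1, 4,  5, 3, 2, 0,    # wattlebird
              9, 3,  2, 2, 5, 10,   # duck
              0, 5, 10, 5, 0, 0),   # lorikeet
  stringsAsFactors = FALSE
)

# Step 2: divide by each species' maximum hourly count, so every species peaks at 1
long$rel <- long$n / ave(long$n, long$species, FUN = max)

# Step 3: spread to a species x hour matrix; hours with no sightings become 0
wide <- tapply(long$rel, list(long$species, long$hour), sum)
wide[is.na(wide)] <- 0

# Step 4: Euclidean distance between the species' hourly profiles
d <- dist(wide, method = "euclidean")

# Step 5: binary matrix, 1 where two species are closer than the assumed cutoff
cutoff <- 0.6
adj <- (as.matrix(d) < cutoff) * 1L
diag(adj) <- 0L

# Edge list of the network, each pair listed once
idx <- which(adj == 1L & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    stringsAsFactors = FALSE)

print(edges)
```

With these toy counts the three species that peak mid-morning are all linked to one another, while the duck, which peaks at the start and end of the window, is left isolated. To draw the network, `adj` can be passed to `igraph::graph_from_adjacency_matrix(adj, mode = "undirected")` and plotted.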

Grading

Two points are reserved for work that compiles easily, is spell-checked, and is nicely presented.